-
Notifications
You must be signed in to change notification settings - Fork 249
Feat: Adding a CNI Telemetry sidecar to CNS pod to replace the azure-vnet-telemetry binary #3824
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR fixes the CNI telemetry service initialization issue where it was failing to start due to an empty Application Insights instrumentation key. The fix ensures that the telemetry service uses the proper AI configuration from the CNS configuration instead of an empty basic configuration.
- Updates the
startTelemetryService
function to accept CNS configuration and properly initialize AI telemetry settings - Adds validation to check for Application Insights instrumentation key before starting the telemetry service
- Introduces proper telemetry configuration mapping from CNS settings to AI config
cns/service/main.go
Outdated
@@ -94,6 +94,7 @@ const ( | |||
// Service name. | |||
name = "azure-cns" | |||
pluginName = "azure-vnet" | |||
aiPluginName = "AzureCNI" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a typo in the PR title 'Telemtry' which should be 'Telemetry'. While this constant name is correct, ensure consistency in naming throughout the codebase.
Copilot uses AI. Check for mistakes.
cns/service/main.go
Outdated
if cnsconfig.TelemetrySettings.AppInsightsInstrumentationKey != "" { | ||
err = tb.CreateAITelemetryHandle(aiConfig, ts.DisableTrace, ts.DisableMetric, ts.DisableEvent) | ||
} else { | ||
logger.Printf("No Application Insights key provided for CNI telemetry service") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we should check other way and remove else?
if cnsconfig.TelemetrySettings.AppInsightsInstrumentationKey == "" {
logger.Printf("No Application Insights key provided for CNI telemetry service")
return
}
err = tb.CreateAITelemetryHandle(aiConfig, ts.DisableTrace, ts.DisableMetric, ts.DisableEvent)
Discussed in the team meeting today - we may want to consider doing telemetry for CNI directly in CNI rather than CNS, to account for scenarios where CNI can't talk to CNS. |
2389f47
to
df332ec
Compare
dde3441
to
b59b87b
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
based on my understanding, this pr replaces azure-vnet-telemetry binary on the node with a sidecar that does the same thing. So CNI would send AI events to this sidecar via a socket if it is present (rather than the azure vnet telemetry long running process). I believe right now azure-vnet-telemetry is still on the vhd (?) so CNI will still attempt to start that azure vnet telemetry binary service. Can you check that there's no race or issue with both CNS and Azure CNI trying to start the telemetry service at the same time?
CNI_TELEMETRY_SIDECAR_DIR = $(REPO_ROOT)/cns/cni-telemetry-sidecar | ||
CNI_TELEMETRY_SIDECAR_BUILD_DIR = $(BUILD_DIR)/cni-telemetry-sidecar | ||
CNI_TELEMETRY_SIDECAR_AI_ID = $(CNI_AI_ID) # Reuse CNI AI ID | ||
CNI_TELEMETRY_SIDECAR_VERSION = $(CNS_VERSION) # Version follows CNS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we move the version to where all the other version variables are?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also if you're intending this to be like azure ip masq merger with its own release versions etc. you'll need to add the exclusion to the tag above: ACN_VERSION ?= $(shell git describe --exclude "azure-ip-masq-merger*" --exclude "azure-ipam*" --exclude "dropgz*" --exclude "zapai*" --exclude "ipv6-hp-bpf*" --tags --always)
@@ -123,7 +128,7 @@ all-binaries-platforms: ## Make all platform binaries | |||
|
|||
# OS specific binaries/images | |||
ifeq ($(GOOS),linux) | |||
all-binaries: acncli azure-cni-plugin azure-cns azure-npm azure-ipam azure-ip-masq-merger ipv6-hp-bpf | |||
all-binaries: acncli azure-cni-plugin azure-cns azure-npm azure-ipam azure-ip-masq-merger ipv6-hp-bpf cni-telemetry-sidecar | |||
all-images: npm-image cns-image cni-manager-image azure-ip-masq-merger-image ipv6-hp-bpf-image |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it looks like you're making a new image-- if this is like a new sidecar image/container image, there are other dockerfiles/scripts you may need to edit (ex: for the signed pipeline). would the image target be added here?
@@ -275,6 +284,7 @@ CNI_IMAGE = azure-cni | |||
CNS_IMAGE = azure-cns | |||
NPM_IMAGE = azure-npm | |||
AZURE_IP_MASQ_MERGER_IMAGE = azure-ip-masq-merger | |||
CNI_TELEMETRY_SIDECAR_IMAGE = azure-cni-telemetry-sidecar |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not sure if you want to keep the variable naming consistent? it's CNI_TELEMETRY here but the image name is azure-cni-telemetry-sidecar-- is the "sidecar" part of the name necessary (here and elsewhere)? If it's a container image I feel like it's already implied that it'll be running in a sidecar or similar.
# Create a CNI Telemetry Sidecar archive for the target platform. | ||
.PHONY: cni-telemetry-sidecar-archive | ||
cni-telemetry-sidecar-archive: cni-telemetry-sidecar-binary | ||
ifeq ($(GOOS),linux) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this binary is on windows and linux would we omit this linux check?
|
||
// Validate batch interval for optimal Azure ingestion | ||
if config.TelemetrySettings.TelemetryBatchIntervalInSecs <= 0 { | ||
logger.Printf("Warning: Invalid telemetry batch interval, using default") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does ai under the hood use the default automatically if it's not valid, or should this also modify the value to the default (also applies to above conditional)?
cns/cni-telemetry-sidecar/main.go
Outdated
flag.Parse() | ||
|
||
// Initialize logging for the CNI telemetry sidecar | ||
logger.InitLogger("azure-cns-cni-telemetry-sidecar", 1, 1, "/var/log/azure-cns-telemetry") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should the naming be the same-- would it be cns or cni telemetry? Like below the logger prints that it's azure cni telemetry sidecar but the log is to the file azure-cns-telemetry
const ( | ||
// CNI telemetry constants aligned with azure-vnet-telemetry | ||
cniTelemetryAppName = "azure-vnet-telemetry" | ||
cniTelemetryVersion = "1.0.0" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: do we want to start with 0.0.x release train or 1.0.x?
b59b87b
to
fff55be
Compare
fff55be
to
6e4257f
Compare
Adding a CNI Telemetry sidecar to CNS pod to replace the azure-vnet-telemetry binary.
Requirements:
Notes: